Performance study on CUDA GPUs for parallelizing the local ensemble transformed Kalman filter algorithm
Authors
Abstract
Modern graphics cards provide computational capabilities that exceed those of current CPUs. As one of the computationally intensive problems, numerical weather prediction (NWP) has the opportunity to benefit from the massive number of threads and large memory throughput of the graphics architecture. In this paper, we present the key steps to integrate the CUDA programming framework into one key component of NWP, the data assimilation algorithm, which incorporates observational data into the model to produce the best initial condition for the next prediction. The data assimilation algorithm studied in this paper exhibits good localization and favors parallelism. To maximize the throughput of the graphics card, over a million CUDA threads, global memory coalescing, and fast shared memory are utilized. We also demonstrate the differences arising from the advancement of GPU architectures from the GTX 200 series to Fermi. The experiments are carried out separately on a GTX 260 (GTX 200 series) and a GTX 460 (Fermi) graphics card. Results show a 72.1× speed-up on the GTX 260 and a 92.7× speed-up on the GTX 460. The results provide attractive evidence for applying CUDA GPUs to highly demanding scientific computation. Copyright © 2010 John Wiley & Sons, Ltd.
Similar resources
An approach to Improve Particle Swarm Optimization Algorithm Using CUDA
The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
Estimation of LOS Rates for Target Tracking Problems using EKF and UKF Algorithms- a Comparative Study
One of the most important problems in target tracking is Line Of Sight (LOS) rate estimation for use in the PN (proportional navigation) guidance law. This paper deals with the estimation of the position and LOS rates of a target with respect to the pursuer from available noisy RF seeker and tracker measurements. Due to many important for exact estimation on tracking problems must target position and Line O...
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with respect to the concurrency of threads. In this paper, to resolve the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
Distance Dependent Localization Approach in Oil Reservoir History Matching: A Comparative Study
To perform any economic management of a petroleum reservoir in real time, a predictable and/or updateable model of the reservoir, along with the ability to estimate uncertainty, is required. One relatively recent method is a sequential Monte Carlo implementation of the Kalman filter: the Ensemble Kalman Filter (EnKF). The EnKF not only estimates uncertain parameters but also provides a recursive estimat...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 24, Issue
Pages -
Publication date 2012